Exploiting ASP for Semantic Information Extraction

Authors

  • Massimo Ruffolo
  • Nicola Leone
  • Marco Manna
  • Domenico Saccà
  • Amedeo Zavatto
Abstract

The paper describes HıLεX, a new ASP-based system for extracting information from unstructured documents. Unlike previous systems, which are mainly syntactic, HıLεX combines semantic and syntactic knowledge to make information extraction more powerful. In particular, exploiting background knowledge stored in a domain ontology significantly strengthens the information extraction mechanisms. HıLεX is founded on a new two-dimensional representation of documents and relies heavily on DLP, an extension of disjunctive logic programming for ontology representation and reasoning that has recently been implemented on top of DLV. The domain ontology is represented in DLP, and the extraction patterns are encoded by DLP reasoning modules, whose execution yields the actual extraction of information from the input document. HıLεX can extract information from both HTML and flat text documents.

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Exploiting Description Knowledge for Keyphrase Extraction

Keyphrase extraction is essential for many IR and NLP tasks. Existing methods usually use the phrases of the document separately without distinguishing the potential semantic correlations among them, or other statistical features from knowledge bases such as WordNet and Wikipedia. However, the mutual semantic information between phrases is also important, and exploiting their correlations may p...

Creating and Exploiting a Web of Semantic Data

Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer progra...

Experimenting with parallelism for the instantiation of ASP programs

In the last few years, microprocessor technologies have been moving towards multi-core architectures, in order to improve performance as well as reduce power consumption. This makes real Symmetric MultiProcessing (SMP) available even on nondedicated machines, and paves the way to the development of better performing software. Notably, the recent application of Answer Set Programming (ASP) in di...

Exploiting Syntactic and Semantic Information for Relation Extraction from Wikipedia

The exponential growth of Wikipedia has recently attracted the attention of a large number of researchers and practitioners. One of the current challenges for Wikipedia is to make the encyclopedia processable by machines. In this paper, we deal with the problem of extracting relations between entities from Wikipedia’s English articles, which can straightforwardly be transformed into Semantic Web meta...

Publication date: 2005